We study the asymptotic behaviour of symbolic computing systems, notably one-dimensional
cellular automata (CA), in order to ascertain whether and at what rate complex rules come to
dominate simple rules in the rule space for increasing neighbourhood range and number of
symbols (or colours), and how different behaviour is distributed in the spaces of different cellular
automata formalisms. Using two different measures, Shannon's block entropy and Kolmogorov
complexity, the latter approximated by two different methods (lossless compressibility and block
decomposition), we arrive at the same trend of larger complex behavioural fractions. We also
advance a notion of asymptotic and limit behaviour for individual rules, both over initial
conditions and runtimes, and we provide a formalisation of Wolfram's classification as a limit
function in terms of Kolmogorov complexity.
1. Introduction



We address several open questions formulated in [Wolfram, 2002b, 1985] concerning the distribution of
different behaviours in the space of cellular automata rules when rule spaces increase in size by number of
states and symbols. The definite answer is ultimately uncomputable (see [Culik II & Yu, 1988]), as a
behavioural characterisation of all the future states of a system requires a full and detailed understanding
of its dynamics, something that is ultimately impossible due to the halting and reachability problems. An
introduction to open questions in computability and cellular automata can be found in [Zenil, 2012c]. Here
we provide a numerical approach to both the question of behavioural changes of a system over time and the
question of the fraction of complex versus simple behaviour in a rule space, using 2 measures of complexity
derived from algorithmic information theory and proven to have the power to characterise randomness.
We provide statistics on fractions of complex behaviour in cellular automata when the number of symbols
and neighbourhood ranges increase.

2. Cellular Automata 

Let S be a finite set of symbols (or colours) of a cellular automaton (CA). A finite configuration is a
configuration with a finite number of symbols which differ from a distinguished state b (the grid
background), denoted by $0^\infty b\, 0^\infty$, where b is a sequence of symbols in S (if binary then
S = {0, 1}). A stack of configurations in which each configuration is obtained from the preceding one by
applying the updating rule is called an evolution. Formally:
Let $f : S^{\mathbb{Z}} \to S^{\mathbb{Z}}$ and $n, i \in \mathbb{N}$; then:

$f(r_t) = \lambda(x_{i-r} \ldots x_i \ldots x_{i+r})$    (1)

where $r_t$ is a configuration (a row) of the CA at time $t \in \mathbb{N}$ and $r_0$ is the initial
configuration (or initial condition). $f$ is also called the global rule of the CA, with
$\lambda : S^n \to S$ the local rule determining the value of each cell and $r$ the neighbourhood range or
radius of the cellular automaton, that is, the number of cells taken into consideration to the left and
right of a central cell $x_i$ in the rule that determines the next value of the cell $x_i$. All cells
update their states synchronously. Cells at the extreme left of a row must be connected to cells at the
extreme right of the row (periodic boundary conditions) in order for $f$ to be considered well defined.

2.1. Elementary Cellular Automata 

The simplest non-trivial CA rule space is the nearest neighbourhood (r = 1) one-dimensional CA with two
possible symbols (or colours) per cell. These are called Elementary Cellular Automata (ECA) as defined
in [Wolfram, 2002], where $\lambda : S^3 \to S$. As for general CA, the set of local rules specifies the
updated value of a site for each possible configuration. In an ECA, each site takes one of two values,
conventionally denoted by "0" and "1" (graphically depicted as white and black cells, respectively). There
are exactly $2^{2^3} = 2^8 = 256$ rules of this type.
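
The ECA definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming Wolfram's standard rule numbering (bit j of the rule number gives the output for the neighbourhood whose three cells, read as a binary number, equal j) and periodic boundary conditions; the function names are illustrative and do not come from the paper.

```python
def eca_rule_table(rule_number):
    """Map each 3-cell neighbourhood (left, centre, right) to the next cell value."""
    if not 0 <= rule_number <= 255:
        raise ValueError("ECA rule numbers range from 0 to 255")
    table = {}
    for index in range(8):
        neighbourhood = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        table[neighbourhood] = (rule_number >> index) & 1
    return table


def eca_step(row, table):
    """Apply the local rule to every cell synchronously, wrapping around at the edges."""
    width = len(row)
    return [
        table[(row[(i - 1) % width], row[i], row[(i + 1) % width])]
        for i in range(width)
    ]


def eca_evolve(rule_number, initial_row, steps):
    """Return the space-time evolution as a list of rows, initial row included."""
    table = eca_rule_table(rule_number)
    rows = [list(initial_row)]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], table))
    return rows


if __name__ == "__main__":
    for row in eca_evolve(90, [0, 0, 0, 1, 0, 0, 0], 3):
        print("".join(str(cell) for cell in row))
```

Enumerating `eca_rule_table(n)` for n from 0 to 255 yields 256 distinct local rules, matching the count given above.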

2.2. Totalistic Cellular Automata 

Given that the number of rules for general cellular automata grows too fast with the number of symbols
(colours) k and the range (neighbourhood) r, we also study the behavioural fractions of totalistic CAs
(denoted by CAt), as a complement to the limited study of general CA (starting from ECA). Because of
their rule-averaged nature, their rule spaces grow more slowly than those of general CA rules: the sum
$x_{i-m} + \ldots + x_i + \ldots + x_{i+n}$ can take the same value for different combinations of the
$x_i$.

A totalistic cellular automaton is a cellular automaton in which the rules depend only on the total
(or equivalently, the average) of the values of the cells in a neighbourhood. Formally, the global rule of a
totalistic CA can be defined by:

$f(x_i) = \lambda(x_{i-m} + \ldots + x_i + \ldots + x_{i+n})$    (2)

where, as before, $x_i$ is a cell in the CA grid and $\lambda$ the local rule. A fractional ratio such as
r = 2/3 means that the rule takes m = 2 cells to the left (including the central one) and n = 3 to the right
(including the central one). A full introduction to these automata can be found in [Wolfram, 2002]. Like an
ECA, the evolution of a one-dimensional totalistic cellular automaton (CAt) can be fully described by
specifying the state a given cell will assume in the next generation based on the average value of the cells
in the neighbourhood, consisting of the cells to its left, the value of the cell itself, and the values of the
cells to the right, according to the neighbourhood range r. The best known, albeit two-dimensional,
totalistic cellular automaton is Conway's cellular automaton known as the Game of Life [Gardner, 1970].
The total number of rules for totalistic CA is given by:

$CA_T(k, r) = k^{(k-1)(2r+1)+1}$    (3)

with k the number of symbols (colours) and r the range or neighbourhood.
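
To make the difference in growth concrete, the following sketch compares the size of the general and the totalistic rule spaces. It assumes an integer range r with a symmetric neighbourhood of 2r+1 cells, the standard count $k^{k^{2r+1}}$ for general rules, and the count $k^{(k-1)(2r+1)+1}$ for totalistic rules (one output per possible neighbourhood total, from 0 to (k-1)(2r+1)). The function names are illustrative only.

```python
def general_rule_count(k, r):
    """Number of general CA rules: one output symbol per possible neighbourhood."""
    neighbourhood_size = 2 * r + 1
    return k ** (k ** neighbourhood_size)


def totalistic_rule_count(k, r):
    """Number of totalistic rules: one output symbol per possible neighbourhood total."""
    neighbourhood_size = 2 * r + 1
    possible_totals = (k - 1) * neighbourhood_size + 1
    return k ** possible_totals


if __name__ == "__main__":
    for k, r in [(2, 1), (3, 1), (2, 2)]:
        general = general_rule_count(k, r)
        totalistic = totalistic_rule_count(k, r)
        # Print the number of decimal digits to keep very large counts readable.
        print(f"k={k}, r={r}: general has {len(str(general))} digits, totalistic = {totalistic}")
```

For k = 2 and r = 1 the general space is the 256 ECA rules, while only 16 of them are totalistic.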
3. Wolfram's Classification



Analysing the average performance of a program is a key problem in computer science. Wolfram
advanced [Wolfram, 2002] a heuristic for classifying computer programs by their space-time diagrams.
Computer programs behave differently for different inputs. It is possible and not uncommon, however, to
analyse a program with respect to a single fixed input, for example, based on a counting argument from
algorithmic information theory. The counting argument is based on the Pigeonhole principle and tells us
that most strings are Kolmogorov random [Li & Vitanyi, 2008], simply because there are significantly fewer
shorter binary programs than binary strings of a fixed size. The approach is very similar to the
Incompressibility Method [Li & Vitanyi, 2008] and may justify the study of the evolution of a computer
program from a random single input as an average case. However, here we are also interested in the rates
of change of behaviour of single cases and in the comparative rates of change of all computer programs in a
defined space.

Evolutions of computer programs can display a variety of different qualitative properties (see, for 
example, Fig. 1), some of which display behaviour associated with pseudo-randomness even for simple initial 
configurations [Wolfram, 2002]. Others have also been proved to be capable of universal computation [Cook, 
2004; Wolfram, 2002]. 




Fig. 1. Rule 22 starting from initial configuration 1001 (Left) versus starting from 1011 (Right). 

Wolfram investigates the behaviour of cellular automata by examining their space-time evolutions 
starting with a random initial configuration. Wolfram's classes can be characterised as follows: 

• Class 1. Symbolic systems which rapidly converge to a uniform state. Examples are rules 0, 32 and 160. 

• Class 2. Symbolic systems which rapidly converge to a repetitive or stable state. Examples are rules 4, 
108 and 218. 

• Class 3. Symbolic systems which appear to remain in a random state. Examples are rules 22, 30, 126 
and 189. 

• Class 4. Symbolic systems which form areas of repetitive or stable states, but which also form structures 
that interact with each other in complicated ways. Examples are rules 54 and 110. 

Random sampling yields some empirical indications of the frequencies of different classes of behaviour 
among cellular automaton rules of various kinds. The choice of a "random" initial configuration is a way 
to capture the "average" behaviour of a system, given that if a CA behaves in some determined fashion for 
most initial configurations, there is a better chance that a "random" initial configuration will characterise 
the typical behaviour of said CA. 

There has been a significant amount of work done on the classification of cellular automata, e.g. in
[Langton, 1991], [Gutowitz, 1991] and [Wuensche, 1997], among others. A good survey and introduction can
be found in [Martinez et al., 2013]. Wuensche, for example, looked for signatures to aid searches, and
statistics on changing portions in each class as colours and neighbourhood ranges increased. The problem
here is that a particular initial condition may yield atypical results, so the results need to be averaged
over a sample of initial conditions. But an optimal definition would capture whether the system actually
reaches a quintessential state showing a stable evolution over time. Another variable is of course the
number of time-steps allowed before measurements start, so that the system can settle into its typical
behaviour, as was also explored in [Wuensche, 1997]. Here, however, we provide evidence that systems tend
to settle into their typical behaviours very soon.

Let us assume that any system s belongs to one of Wolfram's four classes $C_1, C_2, C_3, C_4$. Let us
define a recursive function W that retrieves the Wolfram class of a system s. The elements of a class
$C_n$ are defined by W as follows:

$W(s) = \lim_{i,t \to \infty} \max W(s(i,t))$

where W(s(i,t)) is the Wolfram class of system s for input i and runtime t. W(s(i,t)) and W(s) are
approachable from below. That is, if $C_{n'} = W(s(i',t')) > W(s(i,t)) = C_n$ then s belongs to class
$C_{n'}$ but not to class $C_n$, $n < 4$. W induces a partition, given that a system s cannot belong to 2
different Wolfram classes. That does not mean one cannot misclassify a system for certain values of i and t.
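
The "approachable from below" property can be illustrated with a short sketch. It assumes class labels are encoded as the integers 1 to 4 and that some classifier (hypothetical here, and not specified at this point in the text) has already produced one label per explored pair (i, t); the running maximum is then the current approximation of W(s).

```python
def approximate_wolfram_class(observed_classes):
    """Return the running maximum of class labels observed over explored (i, t) pairs.

    The last element is the current approximation of W(s); it can only increase
    as more initial configurations and longer runtimes are explored.
    """
    best = 0
    history = []
    for label in observed_classes:
        if label not in (1, 2, 3, 4):
            raise ValueError("class labels must be 1, 2, 3 or 4")
        best = max(best, label)
        history.append(best)
    return history


if __name__ == "__main__":
    # Hypothetical labels for successive (i, t) pairs of one system.
    print(approximate_wolfram_class([2, 2, 3, 2, 2]))
```

A system observed as class 2 for some configurations and class 3 for others is thus assigned class 3, which mirrors the treatment of a rule such as rule 22 in this paper.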

We can now formally set forth the question that is of interest in this paper as follows. Because W
induces an equivalence relation, we know that $\bigcap_{n=1}^{4} C_n = \emptyset$ and
$\sum_{n=1}^{4} |C_n| = |\bigcup_{n=1}^{4} C_n|$. What we are interested in is the contribution of each
$|C_n|$ to the total sum when the available resources (e.g. states, symbols) of computer programs grow.
Our particular concern is a type of program studied by Wolfram himself, that is, one-dimensional cellular
automata, when the number of available symbols k and neighbourhood range r grow.

Given Wolfram's Principle of Computational Equivalence (PCE) and because systems in $C_{3,4}$ and
$C_{1,2}$ can be said to display the most widely divergent behaviour, a simplified question we ask in this
paper concerns the fraction of systems in $C_{3,4}$ with respect to $C_{1,2}$ for a finite set of computer
programs (cellular automata).

3.1. Asymptotic Behaviour of Cellular Automata 

Central to AIT is the basic definition of plain algorithmic (Kolmogorov-Chaitin or program-size)
complexity [Kolmogorov, 1965; Chaitin, 1980]:

$K_U(s) = \min\{|p| : U(p) = s\}$    (4)

where U is a universal Turing machine and p the program that, running on U, produces s. Traditionally,
the way to approach the algorithmic complexity of a string has been by using lossless compression
algorithms. The result of a lossless compression algorithm applied to s is an upper bound of the Kolmogorov
complexity of s.

The invariance theorem guarantees that complexity values will only diverge by a constant c (e.g. the
length of a compiler or a translation program) (see [Calude, 2002; Li & Vitanyi, 2008]). If $U_1$ and
$U_2$ are two universal Turing machines, and $K_{U_1}(s)$ and $K_{U_2}(s)$ the algorithmic complexity of s
for $U_1$ and $U_2$ respectively, there exists a constant c such that $|K_{U_1}(s) - K_{U_2}(s)| < c$.

Hence the longer the string, the less important c is (i.e. the choice of programming language or universal
Turing machine). K as a function from s to K(s) is upper semi-computable, meaning that one can find
upper bounds. The shortest description of a string can be interpreted as a compressed version of itself.
Finding a compressed version of a string is therefore a sufficient test of non-algorithmic randomness, that
is, of low Kolmogorov complexity.

Kolmogorov proposed to call a string s c-incompressible if length(s) - K(s) < c, and Martin-Löf proposed
in [Martin-Löf, 1966] to call a string s c-random if it cannot be selected by any possible computable test
at a level of precision c.
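
As a rough illustration of how compression gives an upper bound, the sketch below uses Python's `zlib` (Deflate) as the compressor. This is an assumption for illustration: the compressed length bounds K(s) from above only up to an additive constant (the decompressor), and a string that fails to compress under Deflate is not thereby shown to be c-incompressible; only the converse test is sufficient.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the Deflate-compressed data, a proxy upper bound for K."""
    return len(zlib.compress(data, 9))


def passes_incompressibility_proxy(data: bytes, c: int) -> bool:
    """True if length(s) - C(s) < c, with C the compressed length standing in for K.

    A False result certifies that the string is compressible (low complexity);
    a True result only means that this particular compressor found no regularity.
    """
    return len(data) - compressed_length(data) < c


if __name__ == "__main__":
    regular = b"01" * 5000
    random_like = os.urandom(10000)
    print(len(regular), compressed_length(regular))
    print(len(random_like), compressed_length(random_like))
```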
3.2. A definition of sensitivity to initial conditions

Another question about cellular automata is the matter of their stability under small perturbations to
their initial conditions. In a previous article a classification of ECAs by sensitivity to initial conditions
was introduced [Zenil, 2010]. In this article we want to look at the asymptotic behaviour of the ECAs that do
present phase transitions. This focus was dictated by the concept of Kolmogorov complexity. The method
produces the following variation of Wolfram's classification [Zenil, 2012]:

• Classes 1 and 2 ($C_{1,2}$). The evolution of the system is highly compressible for any number of steps;

• Classes 3 and 4 ($C_{3,4}$). The lengths of the compressed evolutions asymptotically converge to the
lengths of the uncompressed evolutions.

A measure based on the change of the asymptotic direction of the size of the compressed evolutions
of a system for different initial configurations (following a proposed Gray-code enumeration of initial
configurations) was presented in [Zenil, 2010]. It gauges the resiliency or sensitivity of a system
vis-a-vis its initial conditions. The phase transition coefficient defined therein led to an interesting
characterisation and classification of systems, which, when applied to elementary CA, yielded exactly
Wolfram's four classes of system behaviour. The coefficient works by compressing the changes of the
different evolutions through time, normalised by evolution space. It has proved to be an interesting way to
address Wolfram's classification questions. The motivation in [Zenil, 2010] was to address one of the
apparent problems of Wolfram's original classification, that of rules behaving in different ways when
starting from different initial configurations.

Rule 22, just like other ECA rules (e.g. rule 109), evolves in 2 different, clearly distinguished fashions 
(see Fig. 1). One is asymptotically compressible and the other asymptotically uncompressible. For example, 
initial configurations containing substrings of the form l n n l for n > 1, where X n is a repetition of n times 
the bit X, trigger a Class 3 (random looking) behaviour (see Fig. 5), which then dominates over time, with 
no qualitative change the longer the system runs (see Fig. 3). However, evolutions starting from symmetric
initial configurations of odd length cannot lead to non-symmetric configurations, because the local rule of
ECA rule 22 preserves left-right symmetry; they therefore lead only to Class 2 (periodic) behaviour, leaving
the system in full equilibrium. On the other hand, for evolutions of rule 22 out of equilibrium, the
probability of moving from a disordered state to a symmetric one decays exponentially with the ECA runtime
(that is, with the size of the evolution), because by a simple counting argument symmetric configurations
are vastly outnumbered by non-symmetric ones.
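
The counting argument can be checked directly. The sketch below counts symmetric (palindromic) binary strings of length n by brute force and compares the fraction with the closed form $2^{\lceil n/2 \rceil} / 2^n = 2^{-\lfloor n/2 \rfloor}$. It treats every binary string of length n as equally likely, which is an idealising assumption and not a claim about the actual distribution of rule 22 configurations.

```python
from itertools import product


def symmetric_fraction_bruteforce(n):
    """Fraction of binary strings of length n that equal their own reversal."""
    total = 0
    symmetric = 0
    for bits in product((0, 1), repeat=n):
        total += 1
        if bits == bits[::-1]:
            symmetric += 1
    return symmetric / total


def symmetric_fraction_closed_form(n):
    """A palindrome is fixed by its first ceil(n/2) bits, so the fraction is 2^-(n // 2)."""
    return 2.0 ** (-(n // 2))


if __name__ == "__main__":
    for n in (4, 8, 12, 16):
        print(n, symmetric_fraction_bruteforce(n), symmetric_fraction_closed_form(n))
```

The fraction halves every time the length grows by two cells, which is the exponential decay referred to above.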




[Fig. 2 panels: horizontal axes labelled "init configuration × 10" (left) and "init configuration × 100" (right).]



Fig. 2. Asymptotic behaviour of rule 22. Left: First 200 initial conditions in intervals of 10. Right: Initial conditions 200 to 
1000 in intervals of 100. The uncompressibility ratio grows for longer initial configurations. 



Initial conditions that lead to different behaviours remain stable; no behavioural changes occur between
the 2 groups of asymptotically compressible and uncompressible behaviour. Applying regression analysis to
the 2 curves in Fig. 3, which correspond to the compressed versions of the behaviours of rule 22, shows that
the derivatives of the fitted curves are constant both for uncompressible Class 3 behaviour and for
compressible Class 2 behaviour. Hence they are stable over time.





[Fig. 3 plot: two curves labelled "class 3" and "class 2"; vertical axis ticks from 0.05 to 0.30; horizontal axis "runtime × 20", with ticks at 100 and 200.]



Fig. 3. Behavioural bi-stability of Rule 22 over the first 20 initial configurations according to the Gray-code numbering 
scheme, running for 10 steps each, and compressed every 500 steps, representing a point. The rule's behaviour is qualitatively 
the same according to its information-content measures by compressibility. 



3.2.1. Initial configuration numbering scheme and a distance metric 

Ideally, one should feed a system with a natural sequence of initial configurations of gradually increasing
complexity, so that qualitative changes in the evolution of the system are not attributable to large changes
in the initial conditions. A Gray-code enumeration, in which consecutive strings differ in a single symbol,
provides such a sequence.

More precisely, Gray codes minimise the Damerau-Levenshtein distance between consecutive strings. The
Damerau-Levenshtein distance between two strings u and v gives the number of one-element deletions,
insertions, substitutions or transpositions required to transform u into v.
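
A minimal sketch of a binary Gray-code enumeration follows, using the standard reflected binary Gray code `n ^ (n >> 1)`. This is the textbook construction; the exact enumeration used for the figures in this paper (including how configurations are delimited) may differ, so the code should be read as an illustration of the single-substitution property only.

```python
def gray_code(n):
    """Return the n-th reflected binary Gray code as an integer."""
    return n ^ (n >> 1)


def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")


def gray_configuration(n, width):
    """Return the n-th Gray code as a list of bits, padded to the given width."""
    code = gray_code(n)
    return [(code >> shift) & 1 for shift in range(width - 1, -1, -1)]


if __name__ == "__main__":
    for n in range(8):
        print(n, "".join(str(bit) for bit in gray_configuration(n, 3)))
```

Consecutive codes differ by exactly one substitution, so the distance between successive initial configurations is always 1.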

The simplest, not completely trivial, initial configuration of a cellular automaton is the typical single
black cell on an "empty" (or white) background. An initial configuration is the region that must be varied,
consisting only of the non-white portion of a system. For example, the initial configuration 010 is exactly
the same as 1 because the cellular automaton background symbol is zero. Therefore valid initial
configurations for cellular automata should always be wrapped in 1's as they are in Fig. 4.
Fig. 4. First 50 initial configurations based on a Gray-code enumeration for binary CAs.



Gray codes exist for non-binary strings. These are n-ary Gray codes that allow us to generate initial 
conditions for computing systems such as cellular automata with more than 2 states or colours. 
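
For more than two symbols, one standard construction is the modular n-ary Gray code, sketched below. This is a well-known textbook algorithm and is offered as an assumption about how such an enumeration could be generated, not as the procedure used by the authors; `base` plays the role of the number of colours k.

```python
def nary_gray_code(n, base, digits):
    """Return the n-th modular Gray code in the given base, most significant digit first."""
    # Ordinary base-`base` digits of n, least significant first.
    plain = []
    for _ in range(digits):
        plain.append(n % base)
        n //= base
    # Shift each digit by the running sum of the more significant Gray digits.
    gray = [0] * digits
    shift = 0
    for i in range(digits - 1, -1, -1):
        gray[i] = (plain[i] + shift) % base
        shift += base - gray[i]
    return gray[::-1]


def differing_positions(u, v):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


if __name__ == "__main__":
    for n in range(9):
        print(n, nary_gray_code(n, base=3, digits=2))
```

As in the binary case, consecutive codes differ in exactly one position, so the same gradual-change property holds for CAs with k > 2 colours.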

In Fig. 5 one observes 28 of the first 50 initial configurations leading to simple periodic evolutions,
which amounts to a 56% chance. However, many are produced from the same pattern (symmetrical inputs)
and several others are mirror evolutions. Only 16 are truly different, that is, 32% of the first 50. Hence
rule 22 is 32:50 sensitive, unlike, for example, rule 30, which, like rules such as rule 0, is highly robust,
whereas rules such as rule 110 fall in the middle, being moderately sensitive. A systematic analysis of
sensitive, moderate and robust systems may be found in [Martinez et al., 2013].

Fig. 5. Sensitivity of Rule 22 to initial configurations under the Gray-code initial configuration scheme.
Each evolution is preceded by the initial configuration and followed by the losslessly compressed (Deflate)
length. Differences in patterns generated by ECA Rule 22 induced by gradual changes following the
Gray-code based enumeration for CA, with each evolution running for 125 steps.



H. Zenil and E. Villarreal-Zapata

4. Measures of information content and algorithmic complexity 

We aim to address the question of the behaviour of a system using a definition of asymptotic and limit
behaviour that is cognizant of Wolfram's classification problem (systems which may behave differently for
different initial configurations), and that is unlike other approaches (e.g. see [Langton, 1991] or
[Wuensche, 1997, 2005]). We follow 2 different methods. One is based on the traditional Shannon entropy,
which has the advantage of being computable but lacks [Grünwald & Vitanyi, 2004] the power of a universal
measure of complexity, while the other is based on Kolmogorov complexity, which has the advantage of the
full power of a universal complexity measure ([Chaitin, 1980], [Martin-Löf, 1966], [Schnorr, 1971] and
[Gacs, 1974]) but is only semi-computable. To approximate Kolmogorov complexity we in turn used 2
different, complementary approaches, one being the traditional method using lossless compression algorithms
and the other the Coding Theorem Method as defined in [Delahaye & Zenil, 2012; Soler-Toscano et al., 2012].

4.1. Normalised Block Shannon Entropy 

A probabilistically-based complexity measure is given by Block entropy, which, as suggested in [Wolfram,
2002], generalises the concept of Shannon entropy [Shannon, 1948] to blocks of N×N symbols. A normalised
version of the Block entropy for block size N is proposed below:

$H_N(g) = \sum_{r \in \{g\}_{N \times N}} E(r) / \log_2(N \times N)$    (5)

where $\{g\}_{N \times N}$ is the object g decomposed into square blocks of side length N and E(r) the
Shannon entropy of each block r (with boundary conditions). For N = 1, for example, block entropy is simply
the standard unigram entropy. For N = 2, it is the entropy of 2-dimensional arrays of size N×N.
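
One common way to compute a block entropy of a space-time diagram is sketched below: the diagram is cut into non-overlapping N×N blocks, the Shannon entropy of the empirical distribution of distinct blocks is computed, and the result is divided by its maximum possible value so that it lies in [0, 1]. This is an illustrative variant under stated assumptions; in particular the normalisation chosen here (by the logarithm of the number of blocks) and the dropping of incomplete edge blocks are not taken from the definition above.

```python
from collections import Counter
from math import log2


def normalised_block_entropy(grid, n):
    """Entropy of the distribution of non-overlapping n-by-n blocks, scaled to [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    counts = Counter()
    for top in range(0, rows - n + 1, n):
        for left in range(0, cols - n + 1, n):
            block = tuple(tuple(grid[top + dy][left:left + n]) for dy in range(n))
            counts[block] += 1
    total = sum(counts.values())
    if total <= 1:
        return 0.0  # zero or one block: no distribution to measure
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy / log2(total)


if __name__ == "__main__":
    uniform = [[0] * 4 for _ in range(4)]
    mixed = [[0, 0, 0, 1], [0, 0, 0, 0], [1, 0, 1, 1], [0, 0, 0, 0]]
    print(normalised_block_entropy(uniform, 2), normalised_block_entropy(mixed, 2))
```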

This measure is used as a computable, if limited, complexity measure to provide a first indication, 
before proceeding to estimate Kolmogorov complexity in a bid to quantify fractions of behavioural changes 
in increasingly larger spaces of cellular automata. 

4.2. Normalised (lossless) Compression 

We are also interested in the following information-theoretic Normalised Compression (NC) measure:

NC(s) = K(s)/length(s)    (6)

where K(s) is the Kolmogorov complexity of s. Given that K is upper semi-computable, we replace K by
the values retrieved by a general lossless compression algorithm C, for which compressed lengths of s are
upper bounds of K. Values of NC well below 1 (i.e. high compressibility) therefore represent a sufficient
test of low Kolmogorov complexity. Because compression algorithms can retrieve values larger than the
uncompressed original objects we use the following variation:

NC(s) = C(s)/max(C(s), length(s))    (7)
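
Eq. (7) translates directly into code once a compressor is fixed. The sketch below uses Deflate via Python's `zlib`, which is an assumption for illustration (Deflate is the compressor named in the caption of Fig. 5, but the compression level and byte encoding here are arbitrary choices).

```python
import os
import zlib


def normalised_compression(data: bytes) -> float:
    """NC(s) = C(s) / max(C(s), length(s)), with C the Deflate-compressed length."""
    if not data:
        raise ValueError("NC is undefined for the empty string")
    compressed = len(zlib.compress(data, 9))
    return compressed / max(compressed, len(data))


if __name__ == "__main__":
    print(normalised_compression(b"0" * 4096))       # highly regular input
    print(normalised_compression(os.urandom(4096)))  # pseudo-random input
```

The `max` in the denominator caps the value at 1 when the compressor's overhead makes the output longer than the input.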

We distinguish complex from simple behaviour in CA by means of the lossless compressibility
approximation of Kolmogorov complexity. We consider a cellular automaton to be complex or simple if its
compression ratio NC tends to 1 or to 0, respectively. Formally, a system s(i,t) is said to be complex
for initial configuration number i, following a Gray code, and time t, if:

$\lim_{i,t \to \infty} C(s(i,t)) = |s(i,t)|$    (8)

The question of the asymptotic behaviour of a cellular automaton s is therefore the question of whether
$\lim_{i,t \to \infty} NC(s(i,t)) = 1$, for complex behaviour, or 0, for simple behaviour. We will use NC
to determine the Wolfram class of a CA s. If $NC(s) \approx 1$ then $W(s) = C_{3,4}$; otherwise
$W(s) = C_{1,2}$. However, we need a numerical approximation of NC(s), which means evaluating NC(s(i,t))
for a number of initial conditions i following the Gray-code numbering scheme, and for a runtime t.
Therefore the approximated Wolfram class would be W(s(i,t)).

Table 1. The 18 most complex ECAs according to the BDM (including the 16 Class 3 and 4 ECAs according
to Wolfram's classification) sorted by BDM and Block entropy and BDM normalised by lossless compression
ratio. The agreement among the 3 is very high. The Pearson correlation coefficient between Block entropy
and lossless compression NC values is 0.936, 0.80 between NC and BDM, and 0.86 between BDM and Block
entropy (see Fig. 6).

ECA rule no.   Block entropy   Compression ratio (NC)   BDM value
30             0.02774         1.00                     0.681
45             0.02705         1.00                     0.642
54             0.02462         0.799                    0.2225
60             0.02430         0.783                    0.1273
73             0.02398         1.00                     0.3515
90             0.02363         0.786                    0.1920
110            0.02329         1.00                     0.3813
150            0.02320         0.978                    0.2585
94             0.02285         0.742                    0.2416
105            0.02270         1.00                     0.2728
146            0.02234         0.841                    0.2990
126            0.02223         0.844                    0.3200
18             0.02199         0.836                    0.2757
22             0.02177         0.878                    0.3279
122            0.02135         0.809                    0.2424
62             0.01919         0.969                    0.2709
106            0.01766         0.599                    0.2801
41             0.01744         0.910                    0.1832

An interesting example is the elementary cellular automaton with rule 22, which behaves as a fractal 
in Wolfram's class 2 for a certain segment of initial configurations, followed by phase transitions of more 
disordered evolutions typical of class 3 for certain other initial configurations. One question therefore is 
whether it is class 2 or class 3 behaviour that dominates rule 22 for a greater length of time and a larger 
number of initial configurations. An analytical answer provides a clear clue: only highly symmetric initial
configurations produce fractal-like behaviour (Class 2), but the longer the string the smaller the fraction
of highly symmetric strings, and therefore the more unlikely the fractal behaviour. Empirically, as we will
quantify using the NC measure, the fraction of non-compressed evolutions of rule 22 increases non-linearly
(apparently O(log)) as the number of initial configurations increases, with NC slowly converging to 1.
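
The numerical procedure described here, evaluating NC(s(i,t)) over a Gray-code sequence of initial configurations, can be sketched as follows. Several choices are assumptions made for illustration and are not taken from the paper: the grid width and number of steps, periodic boundaries, seeding the row with the binary digits of the i-th Gray code, packing eight cells per byte before compression, and the threshold used to call an evolution "non-compressed".

```python
import zlib


def gray_code(n):
    """n-th reflected binary Gray code."""
    return n ^ (n >> 1)


def evolve(rule, seed_bits, width, steps):
    """Evolve an ECA from seed_bits centred on a zero background (periodic edges)."""
    table = [(rule >> index) & 1 for index in range(8)]
    row = [0] * width
    start = (width - len(seed_bits)) // 2
    row[start:start + len(seed_bits)] = seed_bits
    rows = [row]
    for _ in range(steps):
        row = [
            table[(row[i - 1] << 2) | (row[i] << 1) | row[(i + 1) % width]]
            for i in range(width)
        ]
        rows.append(row)
    return rows


def pack_bits(rows):
    """Pack the cells of an evolution into bytes, eight cells per byte."""
    bits = [cell for row in rows for cell in row]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def nc_for_rule(rule, index, width=128, steps=128):
    """NC of the evolution of `rule` from the index-th Gray-code configuration (index >= 1)."""
    seed_bits = [int(digit) for digit in bin(gray_code(index))[2:]]
    data = pack_bits(evolve(rule, seed_bits, width, steps))
    compressed = len(zlib.compress(data, 9))
    return compressed / max(compressed, len(data))


def fraction_uncompressed(values, threshold):
    """Fraction of NC values at or above an (arbitrary, user-chosen) threshold."""
    return sum(1 for value in values if value >= threshold) / len(values)


if __name__ == "__main__":
    values = [nc_for_rule(22, index) for index in range(1, 51)]
    print(fraction_uncompressed(values, threshold=0.5))
```

On grids this small the compressor's overhead and the zero background outside the light cone both affect NC, so absolute values should not be compared with those reported in the tables; only the relative ordering between initial configurations is meaningful in this sketch.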

There is significant agreement between the classifications produced by Shannon entropy, lossless
compression (NC) and algorithmic probability (BDM), these last two being approximations of Kolmogorov
complexity. Table 1 depicts the classification order for each of the measures.

4.3. Block Decomposition Method 

Another method for approximating the Kolmogorov complexity of d-dimensional objects was advanced
in [Zenil et al., 2012d]. The method is called the Block Decomposition Method (BDM) and is based on the
Coding Theorem Method (CTM) [Delahaye & Zenil, 2012; Soler-Toscano et al., 2012], rooted in the relation
established by algorithmic probability between the frequency of production of a string and its Kolmogorov
complexity. The more frequent the less random, according to the Coding theorem:

$D(s) = \sum_{p : U(p) = s} 1/2^{|p|}$    (9)

That is, the sum over all the programs p for which U outputs s and halts, on a (prefix-free) universal
Turing machine U.

Fig. 6. Agreement between the complexity values for the 88 non-equivalent ECA according to Block entropy,
NC and BDM. Rules sorted from smallest to largest NC value. Full ECA tables are given in Figs. A.1, A.3 and
A.2 in the Appendix.

Then we define the Kolmogorov complexity $K_m$ of an object s as follows:

$K_m(s) = -\log_2(D(s))$    (10)

where D(s) is the frequency of s as calculated in [Zenil et al., 2012d]. The BDM is then defined by:

$K'_m(s(i,t)) = \sum_{(r_u, n_u) \in s(i,t)_{d \times d}} \log_2(n_u) + K_m(r_u)$    (11)

where $r_u$ are the different square arrays in the partition matrix $s(i,t)_{d \times d}$ and $n_u$ is the
number of occurrences of each square array $r_u$ of size $d \times d$ in the space-time evolution of a CA
$s(i,t)_{d \times d}$.

A normalised version of the BDM will be used, defined as follows:

$K'_m(s(i,t)) / |s(i,t)|$    (12)

where $|s(i,t)|$ is the size (length × width) of the space-time diagram produced by the CA s for initial
configuration i and runtime t, this latter being the length. The reader should simply bear in mind from
now on that $K'_m$ (from now on also only denoted by $K_m$) is the approximation of Kolmogorov complexity
by the Block Decomposition Method (BDM).
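
The decomposition in Eqs. (11) and (12) can be sketched as follows. The sketch assumes non-overlapping d×d blocks (incomplete edge blocks are dropped) and takes the CTM values $K_m(r_u)$ from a lookup table supplied by the caller. The table `toy_ctm` below contains made-up placeholder numbers for illustration only; real CTM values come from the computations described in the cited references.

```python
from collections import Counter
from math import log2


def block_decomposition(grid, d, ctm_lookup):
    """Sum of log2(n_u) + K_m(r_u) over the distinct d-by-d blocks r_u of the grid."""
    counts = Counter()
    for top in range(0, len(grid) - d + 1, d):
        for left in range(0, len(grid[0]) - d + 1, d):
            block = tuple(tuple(grid[top + dy][left:left + d]) for dy in range(d))
            counts[block] += 1
    return sum(log2(n_u) + ctm_lookup[r_u] for r_u, n_u in counts.items())


def normalised_block_decomposition(grid, d, ctm_lookup):
    """BDM value divided by the size (rows x columns) of the space-time diagram."""
    return block_decomposition(grid, d, ctm_lookup) / (len(grid) * len(grid[0]))


def make_toy_ctm():
    """Placeholder complexity values for all 2x2 binary blocks (NOT real CTM data)."""
    table = {}
    for index in range(16):
        cells = [(index >> shift) & 1 for shift in (3, 2, 1, 0)]
        block = ((cells[0], cells[1]), (cells[2], cells[3]))
        table[block] = 3.0 + 0.5 * sum(cells)
    return table


if __name__ == "__main__":
    toy_ctm = make_toy_ctm()
    blank = [[0] * 4 for _ in range(4)]
    print(block_decomposition(blank, 2, toy_ctm))
    print(normalised_block_decomposition(blank, 2, toy_ctm))
```

A block that repeats n times contributes its complexity once plus $\log_2 n$ for the repetition count, which is why highly regular diagrams receive low values.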

The BDM is used as a complement and an alternative to lossless compression, given that compression
algorithms do not deal effectively with small objects. More on algorithmic probability and the Coding
theorem can be found in [Cover & Thomas, 2006; Calude, 2002], while more on applying the Coding
theorem to numerically approximate Kolmogorov complexity and BDM can be found in [Delahaye & Zenil,
2012; Soler-Toscano et al., 2012; Soler-Toscano et al., 2012b; Kolmogorov, 1965].
5. Estimation of fractions of Wolfram classes in CA rule spaces

The behavioural classification of all ECAs according to the current Wolfram classification, as given in
[Wolfram, 2002], was possible after only 8 initial configurations. That is, all ECAs reached their assumed
Wolfram class W(s) using the measure NC(s) after exploring only 8 initial configurations according to the
Gray-code enumeration scheme.

The universality of Kolmogorov complexity as a measure of complexity guarantees that if a cellular
automaton has regularities these regularities will eventually be detected. This is why the formalisation
of Wolfram's classes as a mathematical function (given in terms of the limit proposed in Section 3) provides
lower bounds on their complexity and is well defined.
Fig. 7. Asymptotic behaviour of Rule 22: Monotonic increase of number and percentage of uncompressed evolutions when the
order of magnitude is exponentially increased. Each bin contains (10 — 10 )/1000 rule evolutions starting from different 
initial configurations in the interval 10 to 10 according to the Gray-code enumeration. Fractions of complex behaviour 
versus simple behaviour for different initial configurations range from 0.9 to 0.9571. 



The distributions in Fig. 8 show many interesting details of the (un)compressible evolutions of each
of the 88 ECAs that are non-equivalent under left-right reflection, exchanging states 0 and 1, or
performing these two transformations sequentially. For example, rules in Wolfram's Class 4 are negatively
skewed, indicating an asymptotic behaviour towards incompressibility. Class 2, however, produces more
Gaussian-like evolutions with high mean compressibility, while almost all the evolutions of rules in Class 1
are maximally compressible. Rules known for their random (or chaotic) behaviour in Class 3 show maximal
incompressibility, which is consonant both with what is known and with their Wolfram class.

Rules 30 and 45 show maximum incompressibility and therefore have a high estimated Kolmogorov
complexity. Rule 110 displays the maximum asymptotic incompressibility and rules 41, 60, 62, 73 and 94
show a high degree of incompressibility. NC singles out three other ECA rules (rules 62, 73 and 94)
whose histograms of uncompressed evolutions suggest they are not obviously in classes 1 or 2 (hence
potentially in either class 3 or 4); the remaining 70 ECA rules would therefore belong to classes 1 or 2.
Tables 2 and 3 summarise these results.

Table 2. Statistical properties from the histograms of the 18 ECA rules (see Fig. 8) that include 
all 15 ECA rules in classes 3 and 4 according to Wolfram. Rules 30 and 45 have maximal complexity 
and therefore indeterminate (i) skewness. The criterion on skewness, distribution mean and standard 
deviation that characterises all ECA rules in Wolfram's classes 3 and 4 according to Wolfram 2002 
is: mean greater than 0.59, or skewness smaller than -0.87 and standard deviation greater than 
0.55, or standard deviation greater than 0.1. 

ECA         skewness   mean     standard    Wolfram
Rule no.                        deviation   Class

18           0.319     0.619    0.099       3
22          -0.78      0.854    0.203       3
30           i         1        0           3
41          -0.515     0.723    0.146       4
45           i         1        0           3
54           0.329     0.598    0.149       4
60          -1.12      0.58     0.0507      3
62           0.602     0.657    0.037       2
73          -1.164     0.955    0.0596      2
90          -0.728     0.614    0.0597      3
94          -0.085     0.621    0.145       2
105         -1.089     0.927    0.0550      3
106         -0.875     0.44     0.121       2
110         -3.1       0.995    0.0116      4
122         -0.544     0.65     0.106       3
126          0.148     0.64     0.0772      3
146          0.409     0.63     0.0848      3
150         -1.004     0.834    0.0589      3

Table 3. Suggested behavioural classification of the 88 non-symmetric ECA rules according to the 
statistical properties of the NC distributions (see Fig. 8) for each class and the logical/statistical 
relation given in Table 2, which retrieves all class 3 and 4 ECAs according to Wolfram and adds 
3 other ECA rules that, according to the NC histograms, are not obviously in classes 1 or 2. 

Wolfram
Class       Non-symmetric ECA rules

$C_{3,4}$:  18, 22, 30, 41, 45, 54, 60, 62, 73, 90, 94, 105, 106, 110, 122, 126, 
            146, 150 

$C_{1,2}$:  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 23, 24, 25, 26, 
            27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44, 46, 50, 51, 56, 
            57, 58, 72, 74, 76, 77, 78, 104, 108, 128, 130, 132, 134, 136, 138, 
            140, 142, 152, 154, 156, 160, 162, 164, 168, 170, 172, 178, 184, 
            200, 204, 232 

For general CA rules, the values shown in Table 4, obtained using the normalised Block Decomposition 
Method (see Eq. 12) to approximate Kolmogorov complexity, are consistent with the results from Shannon 
entropy (Table 5 and Table 6) and compressibility (Table 7). Space-time diagrams were found to show 
increasingly larger fractions of higher entropy and algorithmic randomness according to all these complexity 
measures. 

Fig. 8. Histograms of uncompressible evolutions (according to NC) of all the 88 non-equivalent ECAs for 1000 initial 
configurations following the Gray-code enumeration scheme, each running for t = 100 steps. In keeping with what is known 
about ECA, examining the distribution shapes in these histograms allows one to ascertain which ones are complex and to 
which Wolfram class they belong. For example, rules known to be in the same Wolfram class exhibit similar compressibility 
distributions. 

Table 7 shows the results of the exhaustive search in which a rule was considered incompressible (and 
therefore complex) if at least one initial configuration led to a space-time evolution asymptotically 
incompressible among the first 50 steps. The rule space for increasing k grows too fast, so we proceeded by 
sampling for an initial segment of k = 3. Fig. 10 and Fig. 9 show the main results of the complexity 
experiments. For each evolution the compression ratio (NC) was calculated. A CA was considered incompressible 
if, for at least one initial configuration, its evolution had a compression ratio greater than 1/2 up to the 
number of time steps chosen. CAs below that threshold were considered simple. 
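
The decision rule just described can be sketched as follows. This is only an illustrative reading of the criterion: the function names are hypothetical, the NC values in the example are invented for demonstration and are not measurements, and the strict inequality at the 1/2 threshold is an assumption.

```python
def is_complex(nc_values, threshold=0.5):
    """A rule counts as complex if at least one of its evolutions
    (one NC value per initial configuration) exceeds the threshold."""
    return any(v > threshold for v in nc_values)


def complex_fraction(nc_by_rule, threshold=0.5):
    """Fraction of rules in a (sampled or exhausted) rule space that are complex."""
    flags = [is_complex(values, threshold) for values in nc_by_rule.values()]
    return sum(flags) / len(flags)


# Invented NC values for three hypothetical rules, three initial configurations each.
example = {
    "rule_a": [0.02, 0.03, 0.02],   # always compressible -> simple
    "rule_b": [0.10, 0.62, 0.30],   # one incompressible evolution -> complex
    "rule_c": [0.45, 0.50, 0.49],   # never strictly above 1/2 -> simple
}
fraction = complex_fraction(example)
```

Because a single incompressible evolution is enough, the count can only grow as more initial configurations are examined, which is why the number of configurations and the runtime are stated alongside each reported ratio.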

The rate of convergence to 1 of CAt seems to follow a quadratic equation. Given that rule spaces are 
proper supersets of smaller rule spaces, the fraction of complex rules cannot be exactly 1, but Fig. 10 shows 
the growth rate of the function and the speed of its apparent asymptotic movement towards 1. 

Table 4. Complexity ratios according to Kolmogorov complexity approximated by the normalised Block 
Decomposition Method (NBDM) based on algorithmic probability (the Coding theorem). Increasing r led to a 
greater fraction of complex rules, counting as complex those rules with mean NBDM values larger than 0.20 
among the first 20 initial configurations according to the Gray-code enumeration, with each rule running 
for t = 50 steps. Where we proceeded by sampling, sample sizes are indicated in parentheses. 

CA          NBDM for          Total number of rules
range (r)   k = 2             ($2^{2^{2r+1}}$)

1           0T4               256 (ECA)
3/2         0.3241            65 536
2           0.5206 (10 000)   4 294 967 296
5/2         0.795 (10 000)    18 446 744 073 709 551 616

Table 5. Block Shannon entropy increases for increasing r among totalistic CA (CAt) rules. Block Shannon 
entropy is averaged over the first 20 initial configurations according to the Gray-code enumeration, each 
rule running for t = 50 steps. Only rules with Block Shannon entropy larger than 0.20 are counted as complex. 

CAt                     Number of rules
range (r)   k = 2       ($k^{(k-1)(2r+1)+1}$)

1           0.125       16
3/2         0.25        32
2           0.39        64
5/2         0.4375      128
3           0.53125     256
7/2         0.568       512
4           0.6054      1024
9/2         0.6279      2048
5           0.6547      4096
11/2        0.676       8192
6           0.694       16 384
13/2        0.708       32 768

When sets were not exhausted, as reported in Table 7, sample sizes were calculated by running several 
samples of increasing size. A sample of 0.005 the size of the original rule space gave us a complex/simple 
ratio with 2 precision digits. Increasing the sample size tenfold, to 0.05 (or 5%) of the total rule space 
size, increased the accuracy by only one extra decimal digit. It was therefore decided that a size of 0.005 
provided enough statistical confidence. 
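
To make the rule-space sizes and the sampling fraction concrete, the following sketch recomputes the rule counts from the formulas given in the table headers and derives a 0.005 sample size. The function names are hypothetical, and rounding the sample size to the nearest integer is an assumption.

```python
from fractions import Fraction


def neighbourhood_width(r):
    """Number of cells in the neighbourhood, 2r + 1, for integer or half-integer r."""
    width = 2 * Fraction(r) + 1
    if width.denominator != 1:
        raise ValueError("r must be an integer or a half-integer")
    return int(width)


def general_rule_count(k, r):
    """Size of the general CA rule space: k^(k^(2r+1))."""
    return k ** (k ** neighbourhood_width(r))


def totalistic_rule_count(k, r):
    """Size of the totalistic CA rule space: k^((k-1)(2r+1)+1)."""
    return k ** ((k - 1) * neighbourhood_width(r) + 1)


def sample_size(total_rules, fraction=0.005):
    """Number of rules to sample (rounding to nearest is an assumption)."""
    return round(total_rules * fraction)


eca_rules = general_rule_count(2, 1)                 # 256
k3_space = totalistic_rule_count(3, Fraction(5, 2))  # 1594323
k3_sample = sample_size(k3_space)                    # 7972
```

With these definitions, a 0.005 sample of the k = 3 totalistic spaces for r = 2 and r = 5/2 gives 886 and 7972 rules respectively, which agrees with the denominators reported for those spaces in Table 7.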

Table 6. Block Shannon entropy also increased for k = 3 among totalistic CA rule spaces. Block Shannon 
entropy is averaged over the first 20 initial configurations according to the Gray-code enumeration, each 
rule running for t = 50 steps. Only rules with Block Shannon entropy larger than 0.20 are counted as complex. 

CAt                     Number of rules
range (r)   k = 3       ($k^{(k-1)(2r+1)+1}$)

1           0.329       2187
3/2         0.504       19 683
2           0.605       177 147

Table 7. Ratios of compressibility/uncompressibility in totalistic CA (CAt) rule spaces of neighbourhood 
range r and number of symbols k. Increasing both r and k increases the fraction of uncompressible evolutions 
among the first 20 initial configurations according to the Gray-code enumeration, with each running for 
t = 50 steps. Where we proceeded by sampling, sample sizes are indicated in parentheses. 

CAt range (r)   k = 2                     k = 3

1               6/16 = 0.375              1059/2187 = 0.484
3/2             15/32 = 0.468             672/1000 = 0.672 (19 683)
2               28/64 = 0.4375            658/886 = 0.742 (177 147)
5/2             63/128 = 0.492            6487/7972 = 0.813 (1 594 323)
3               146/256 = 0.57
7/2             313/512 = 0.611
4               665/1024 = 0.649
9/2             1398/2048 = 0.682
5               2911/4096 = 0.7
11/2            6041/8192 = 0.737
6               12 461/16 384 = 0.76
13/2            25 562/32 768 = 0.78

6. Characterisation of Wolfram's Classification in terms of Algorithmic Randomness 

The present study offers a new and complementary formulation of Wolfram's classes, producing a number 
of interesting properties to look into. The following is a characterisation of Wolfram's classes in terms of 
asymptotic behaviour and (un)compressibility based on algorithmic information theory: 

• Class 1. Evolution has asymptotic compressibility ratio equal to 0 (lowest Kolmogorov complexity). 

• Class 2. Evolution has compressibility ratio smaller than 1/2 (low Kolmogorov complexity). 

• Class 3. Evolution has compressibility ratio 1 (uncompressible randomness). 

• Class 4. Evolution has asymptotic compressibility ratio equal to 1 (asymptotic uncompressible randomness). 

This is useful in determining the difficulty of deciding the Wolfram class of a dynamic system, implying 
that Wolfram's classification is itself uncomputable in this formal version, a direct inference from the 
uncomputability of K. 
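
Since the characterisation is stated in terms of limits of an uncomputable quantity, any program can only apply it heuristically to finite data. The following sketch shows one such finite-time reading. The function name, the tolerance `eps`, and the use of the last observed value as a stand-in for the asymptotic ratio are all assumptions made for illustration, not part of the characterisation itself.

```python
def guess_wolfram_class(nc_series, eps=0.05):
    """Heuristic class guess from compressibility ratios measured at
    increasing runtimes (earliest first). Returns 1-4, or None when
    the finite data do not match any of the four descriptions."""
    last = nc_series[-1]
    if last <= eps:                                # ratio tends to 0
        return 1
    if last < 0.5:                                 # ratio stays below 1/2
        return 2
    if all(v >= 1 - eps for v in nc_series):       # ratio is 1 throughout
        return 3
    if last >= 1 - eps:                            # ratio only approaches 1
        return 4
    return None


# Invented ratio sequences, for demonstration only.
guesses = [
    guess_wolfram_class([0.30, 0.10, 0.00]),
    guess_wolfram_class([0.40, 0.30, 0.30]),
    guess_wolfram_class([1.00, 1.00, 1.00]),
    guess_wolfram_class([0.60, 0.90, 1.00]),
    guess_wolfram_class([0.60, 0.70, 0.70]),
]
```

A guess of this kind can always be overturned by a longer runtime or a different initial configuration, which is the practical face of the uncomputability noted above.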

7. Conclusions 

We have studied the asymptotic behaviour of cellular automata both for initial configurations and 
runtimes, as well as the fractions of Wolfram classes for increasingly larger rule spaces of computer programs, 
specifically cellular automata. Among the results, we confirmed preliminary findings [Packard & Wolfram, 
1985] reporting a preponderance of complex classes 3 and 4 as the number of symbols (colours) and 
neighbourhood ranges of these systems increase. An open question is whether the same trend persists for 
larger dimensions (or other computing models), although we have no evidence to believe that 
the trend will be different. 

Fig. 9. Left: Block Shannon entropy progression in different rule spaces for increasing r and k. Right: Kolmogorov complexity 
progression for k = 2 according to the NBDM based on algorithmic probability. For k = 3, r = 5/2 already has a larger fraction 
of complex systems than k = 2, r = 13/2. 

Fig. 10. Plot of compressibility ratios (NC) in different rule spaces for increasing k and r in totalistic CA rule spaces for 
k = 2 and 3. The curve that fits the data points for k = 2 is logarithmic. 



We have shown that behaviour is for the most part stable for a given initial condition but that 
complexity increases in terms of the number of both the initial configurations and rule spaces. The 
rate at which it does so is fast before asymptotically slowing down as it approaches compressibility ratio 1 
(uncompressibility), growing by $\log(r^{1/k})$ according to regression analysis, with r the neighbourhood 
range and k the number of colours. Similar results were obtained by using two different measures. One was 
Shannon's entropy in the form of a normalised block entropy applied to CA space-time diagrams. The 
other was Kolmogorov complexity approximated by two alternative methods, the traditional 
lossless compression algorithm and the Coding theorem based on algorithmic probability, from which a 
Block Decomposition technique was devised. 

While the Shannon entropy result may be less surprising, because the number of possible configurations 
with more symbols and larger neighbourhood ranges suggests an increase in entropy, the fact that the three 
approaches used to measure complexity in rule spaces (including estimations of a universal and objective 
measure such as Kolmogorov complexity) yielded the same results provides a robust answer to the question 
regarding the change of fractions of behaviour and the rate of this change in computer programs for 
increasing resources: complexity is clearly seen to dominate asymptotically. 

Fig. A.1. The 88 different ECA sorted from largest to smallest mean Block entropy over the first 20 initial configurations 
according to the Gray-code enumeration, normalised by t × N with N as the chosen Entropy block size (in this case 3 and 
4) and t = 50 the runtime. Each ECA is shown here starting from a random initial configuration. 

Fig. A.2. The 88 different ECA sorted from largest to smallest mean lossless compression over the first 20 initial configurations 
according to the Gray-code enumeration. For example, rules 110 (124), 105, 73 (109), 45 (75, 89, 101) and 30 (86) are among 
the less compressible (hence the most algorithmically complex). Each ECA is shown here starting from a random initial 
configuration. 

Fig. A.3. The 88 different ECA sorted from largest to smallest normalised mean Block Decomposition (BDM) over the first 
20 initial configurations according to the Gray-code enumeration. 